Middle School


Homework faces an existential crisis. Has AI made it pointless?

Los Angeles Times

Students wait for a celebration of high test scores to begin at La Tijera Academy of Excellence in Inglewood on Wednesday.


Seeing the Big Picture: Evaluating Multimodal LLMs' Ability to Interpret and Grade Handwritten Student Work

Henkel, Owen, Roberts, Bill, Jaffe, Doug, Holt, Laurence

arXiv.org Artificial Intelligence

Recent advances in multimodal large language models (MLLMs) raise the question of their potential for grading, analyzing, and offering feedback on handwritten student classwork. This capability would be particularly beneficial in elementary and middle school mathematics education, where most work is still done by hand; seeing a student's full working of a problem provides valuable insight into their learning process but is extremely time-consuming to grade. We present two experiments investigating MLLM performance on handwritten student mathematics classwork. Experiment A examines 288 handwritten responses from Ghanaian middle school students solving arithmetic problems with objective answers. In this context, models achieved near-human accuracy (95%, κ = 0.90) but exhibited occasional errors that human educators would be unlikely to make. Experiment B evaluates 150 mathematical illustrations from American elementary students, where the drawing itself is the answer to the question. These tasks lack single objective answers and require sophisticated visual interpretation as well as pedagogical judgment to analyze and evaluate. To separate MLLMs' visual capabilities from their pedagogical abilities, we first asked them to grade the student illustrations directly, and then augmented each image with a detailed human description of the illustration. When the models had to analyze the student illustrations directly, they struggled, achieving only κ = 0.20 against ground-truth scores; when given human descriptions, their agreement improved dramatically to κ = 0.47, in line with human-to-human agreement. This gap suggests MLLMs can "see" and interpret arithmetic work relatively well, but still struggle to "see" student mathematical illustrations.
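The κ values above are Cohen's kappa, which measures agreement between two raters after correcting for agreement expected by chance. A minimal sketch of how such a score is computed (the labels below are illustrative, not the paper's data):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: inter-rater agreement corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((counts_a[lbl] / n) * (counts_b[lbl] / n)
              for lbl in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical model grades vs. ground-truth grades on six items
model = ["correct", "correct", "wrong", "correct", "wrong", "correct"]
truth = ["correct", "correct", "wrong", "wrong",   "wrong", "correct"]
print(round(cohen_kappa(model, truth), 2))  # 0.67
```

Raw percent agreement (5/6 here) overstates performance when one label dominates, which is why agreement-with-ground-truth results like these are typically reported as κ rather than accuracy alone.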


Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform

Lee, Yong Oh, Bang, Byeonghun, Lee, Joohyun, Oh, Sejun

arXiv.org Artificial Intelligence

As personalized learning gains increasing attention in mathematics education, there is a growing demand for intelligent systems that can assess complex student responses and provide individualized feedback in real time. In this study, we present a personalized auto-grading and feedback system for constructive geometry tasks, developed using large language models (LLMs) and deployed on the Algeomath platform, a Korean online tool designed for interactive geometric constructions. The proposed system evaluates student-submitted geometric constructions by analyzing their procedural accuracy and conceptual understanding. It employs a prompt-based grading mechanism using GPT-4, where student answers and model solutions are compared through a few-shot learning approach. Feedback is generated based on teacher-authored examples built from anticipated student responses, and it dynamically adapts to the student's problem-solving history, allowing up to four iterative attempts per question. The system was piloted with 79 middle-school students, where LLM-generated grades and feedback were benchmarked against teacher judgments. Grading closely aligned with teachers, and feedback helped many students revise errors and complete multi-step geometry tasks. While short-term corrections were frequent, longer-term transfer effects were less clear. Overall, the study highlights the potential of LLMs to support scalable, teacher-aligned formative assessment in mathematics, while pointing to improvements needed in terminology handling and feedback design.


LearnLens: An AI-Enhanced Dashboard to Support Teachers in Open-Ended Classrooms

Srivastava, Namrata, Jain, Shruti, Cohn, Clayton, Mohammed, Naveeduddin, Timalsina, Umesh, Biswas, Gautam

arXiv.org Artificial Intelligence

Exploratory learning environments (ELEs), such as simulation-based platforms and open-ended science curricula, promote hands-on exploration and problem-solving but make it difficult for teachers to gain timely insights into students' conceptual understanding. This paper presents LearnLens, a generative AI (GenAI)-enhanced teacher-facing dashboard designed to support problem-based instruction in middle school science. LearnLens processes students' open-ended responses from digital assessments to provide various insights, including sample responses, word clouds, bar charts, and AI-generated summaries. These features elucidate students' thinking, enabling teachers to adjust their instruction based on emerging patterns of understanding. The dashboard was informed by teacher input during professional development sessions and implemented within a middle school Earth science curriculum. We report insights from teacher interviews that highlight the dashboard's usability and potential to guide teachers' instruction in the classroom.


Insights from Interviews with Teachers and Students on the Use of a Social Robot in Computer Science Class in Sixth Grade

Schenk, Ann-Sophie L., Schiffer, Stefan, Song, Heqiu

arXiv.org Artificial Intelligence

In this paper we report first insights from interviews with teachers and students on using social robots in sixth-grade computer science class. Our focus is on learning about requirements and potential applications. We are particularly interested in both perspectives, the teachers' and the learners' views on how robots could be used and what features they should or should not have. Results show that teachers as well as students are very open to robots in the classroom. However, requirements are partially quite heterogeneous across the groups. This leads to complex design challenges, which we discuss at the end of the paper.


From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation

Zhou, Chengliang, Wang, Mei, Zhang, Ting, Zhu, Qiannan, Li, Jian, Huang, Hua

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in mathematical problem-solving. However, the transition from providing answers to generating high-quality educational questions presents significant challenges that remain underexplored. To advance Educational Question Generation (EQG) and facilitate LLMs in generating pedagogically valuable and educationally effective questions, we introduce EQGBench, a comprehensive benchmark specifically designed for evaluating LLMs' performance in Chinese EQG. EQGBench establishes a five-dimensional evaluation framework supported by a dataset of 900 evaluation samples spanning three fundamental middle school disciplines: mathematics, physics, and chemistry. The dataset incorporates user queries with varying knowledge points, difficulty gradients, and question type specifications to simulate realistic educational scenarios. Through systematic evaluation of 46 mainstream large models, we reveal significant room for development in generating questions that reflect educational value and foster students' comprehensive abilities.


I Had a Huge Middle School Crush. So I Used a Controversial Technology to Help Me Talk to Her.

Slate

Sign up for the Slatest to get the most insightful analysis, criticism, and advice out there, delivered to your inbox daily. In our eighth grade classroom, her name was Hanna. On AOL Instant Messenger, she was Banana3017. I was in love with both. At school, she was funny, and kind, and she had blue eyes that made my cheeks glow the same fiery color as her hair when she looked at me.


Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study

Thomas, Danielle R., Borchers, Conrad, Lin, Jionghao, Kakarla, Sanjit, Bhushan, Shambhavi, Gatz, Erin, Gupta, Shivang, Abboud, Ralph, Koedinger, Kenneth R.

arXiv.org Artificial Intelligence

Tutoring improves student achievement, but identifying and studying which tutoring actions are most associated with student learning at scale, based on audio transcriptions, is an open research problem. This study investigates the feasibility and scalability of using generative AI to identify and evaluate specific tutor moves in real-life math tutoring. We analyze 50 randomly selected transcripts of college-student remote tutors assisting middle school students in mathematics. Using GPT-4, GPT-4o, GPT-4-turbo, Gemini-1.5-pro, and LearnLM, we assess tutors' application of two tutor skills: delivering effective praise and responding to student math errors. All models reliably detected relevant situations, for example a tutor praising a student (94-98% accuracy) or a student making a math error (82-88% accuracy), and effectively evaluated the tutors' adherence to tutoring best practices, aligning closely with human judgments (83-89% and 73-77%, respectively). We propose a cost-effective prompting strategy and discuss practical implications for using large language models to support scalable assessment in authentic settings. This work further contributes LLM prompts to support reproducibility and research in AI-supported learning.


I Teach Middle Schoolers. I'm Seeing Something in the Kids That's Getting Worse Every Year.

Slate

Good Job is Slate's advice column on work. Have a workplace problem big or small? I have been an eighth-grade teacher for seven years now and am beginning to think I made a terrible mistake in terms of choosing my profession. The kids I teach are rude and feral. They refuse to read or treat others with the slightest bit of decency, give up at the first sign of difficulty, and possess the attention span of goldfish.


Children's Mental Models of AI Reasoning: Implications for AI Literacy Education

Dangol, Aayushi, Wolfe, Robert, Zhao, Runhua, Kim, JaeWon, Ramanan, Trushaa, Davis, Katie, Kientz, Julie A.

arXiv.org Artificial Intelligence

As artificial intelligence (AI) advances in reasoning capabilities, most recently with the emergence of Large Reasoning Models (LRMs), understanding how children conceptualize AI's reasoning processes becomes critical for fostering AI literacy. While one of the "Five Big Ideas" in AI education highlights reasoning algorithms as central to AI decision-making, less is known about children's mental models in this area. Through a two-phase approach, consisting of a co-design session with 8 children followed by a field study with 106 children (grades 3-8), we identified three models of AI reasoning: Deductive, Inductive, and Inherent. Our findings reveal that younger children (grades 3-5) often attribute AI's reasoning to inherent intelligence, while older children (grades 6-8) recognize AI as a pattern recognizer. We highlight three tensions that surfaced in children's understanding of AI reasoning and conclude with implications for scaffolding AI curricula and designing explainable AI tools.